Version: V11

VTT Loader Node

The VTT Loader node retrieves WebVTT subtitle files from URLs and parses them into structured caption objects with timing information. It automatically converts VTT timestamps to milliseconds for programmatic processing. The parsed captions are returned as an array of objects containing start time, end time, and text content.

How It Works

When the node executes, it downloads the VTT file from the specified URL, parses the WebVTT format to extract individual captions with their timing information, and converts timestamps from the standard VTT time format (HH:MM:SS.mmm) into milliseconds. Each caption in the VTT file becomes a separate object in the output array, preserving sequential order and timing information from the original file.

The node supports both simple timestamps (MM:SS.mmm) and full timestamps (HH:MM:SS.mmm), converting both formats to millisecond precision for consistency. Downloaded files are temporarily stored during processing and automatically cleaned up after parsing completes.
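
The parsing and timestamp conversion described above can be sketched roughly as follows. This is a minimal illustration and not the node's actual implementation: the function names `vtt_timestamp_to_ms` and `parse_vtt` are hypothetical, and the sketch assumes the file content has already been downloaded into a string.

```python
import re

# Matches "MM:SS.mmm" or "HH:MM:SS.mmm"; the hours part is optional.
_TIMESTAMP = re.compile(r"^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})$")


def vtt_timestamp_to_ms(timestamp: str) -> int:
    """Convert a WebVTT timestamp to integer milliseconds."""
    match = _TIMESTAMP.match(timestamp.strip())
    if match is None:
        raise ValueError(f"Unrecognized VTT timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = match.groups()
    total_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_vtt(content: str) -> list[dict]:
    """Parse WebVTT text into caption dicts, preserving file order."""
    captions = []
    normalized = content.replace("\r\n", "\n").strip()
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # The timing line may be preceded by an optional cue identifier.
        timing_index = next(
            (i for i, line in enumerate(lines) if "-->" in line), None
        )
        if timing_index is None:
            continue  # "WEBVTT" header, NOTE, or STYLE block: no timing line
        start_raw, end_part = lines[timing_index].split("-->", 1)
        end_raw = end_part.split()[0]  # drop cue settings such as "align:start"
        captions.append({
            "startTime": vtt_timestamp_to_ms(start_raw),
            "endTime": vtt_timestamp_to_ms(end_raw),
            "text": "\n".join(lines[timing_index + 1:]).strip(),
        })
    return captions
```

In this sketch, cue settings that follow the end timestamp are discarded, which mirrors the node's documented behavior of keeping only timing and text.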

The output is an array of caption objects where each object contains three properties: startTime (integer in milliseconds), endTime (integer in milliseconds), and text (string content). This structured format is compatible with various downstream operations such as text analysis, caption editing, timestamp manipulation, or conversion to other subtitle formats.
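
As one example of a downstream operation, the caption array could be converted to SubRip (SRT) text. The sketch below assumes the caption objects have the three documented properties; `ms_to_srt_timestamp` and `captions_to_srt` are hypothetical helper names, not part of the node.

```python
def ms_to_srt_timestamp(ms: int) -> str:
    """Format integer milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, remainder = divmod(ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def captions_to_srt(captions: list[dict]) -> str:
    """Render caption objects as SRT text, numbering cues from 1."""
    blocks = []
    for index, caption in enumerate(captions, start=1):
        start = ms_to_srt_timestamp(caption["startTime"])
        end = ms_to_srt_timestamp(caption["endTime"])
        blocks.append(f"{index}\n{start} --> {end}\n{caption['text']}")
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"
```

Because the node already normalizes timestamps to milliseconds, the conversion only needs integer arithmetic.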

Configuration Parameters

Input Field

Input Field (Text, Required): Workflow variable containing the VTT URL.

The URL must start with http:// or https:// and point to a valid VTT file. Variable interpolation using ${variable_name} syntax supports dynamic URL construction. The VTT file should follow the WebVTT specification with properly formatted timestamps and caption text.

Common patterns: https://storage.example.com/subtitles/video123.vtt, ${subtitle_url}, https://cdn.example.com/captions/${video_id}.vtt.
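
To illustrate how the patterns above resolve, here is a rough sketch of ${variable_name} substitution followed by the http:// or https:// check. The workflow engine performs the real interpolation; `resolve_url` and the plain dictionary of variables are assumptions made for this example only.

```python
import re
from urllib.parse import urlparse

# Matches ${variable_name} placeholders.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve_url(template: str, variables: dict[str, str]) -> str:
    """Substitute ${name} placeholders, then check the URL scheme."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"Undefined workflow variable: {name}")
        return str(variables[name])

    url = _PLACEHOLDER.sub(substitute, template)
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"URL must start with http:// or https://: {url!r}")
    return url
```

Checking the scheme after substitution also covers the case where the whole URL comes from a single variable such as ${subtitle_url}.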

Output Field

Output Field (Text, Required): Workflow variable where parsed caption objects are stored.

The output is an array of caption objects with three properties per object: startTime (milliseconds), endTime (milliseconds), and text (string content). The array preserves sequential order from the VTT file.

Example output structure:

[
  {"startTime": 0, "endTime": 2500, "text": "Welcome to the video."},
  {"startTime": 2500, "endTime": 5000, "text": "This is the second caption."}
]

Common naming patterns: vtt_captions, subtitle_data, caption_list, parsed_captions.

Common Parameters

This node supports common parameters shared across workflow nodes, including Stream Output Response, Streaming Messages, and Logging Mode. For detailed information, see Common Parameters.

Best Practices

Validate that VTT URLs are accessible and return valid WebVTT content before processing
Implement error handling using conditional nodes to gracefully handle missing or malformed files
Use variable interpolation to build URLs from video IDs so workflows stay reusable without hardcoded values
Verify that VTT files contain the expected caption structure and timing information before passing them to downstream nodes
For video synchronization, ensure timestamps match the video's actual timing to prevent caption misalignment
Use descriptive variable names like video_subtitles rather than generic names to improve workflow maintainability
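
The structure check recommended above might look like the following sketch when run on the node's output before it reaches downstream nodes. The function name `find_caption_problems` and the specific checks are illustrative assumptions, not built-in behavior of the node.

```python
def find_caption_problems(captions: list) -> list[str]:
    """Return human-readable problems; an empty list means the data looks sane."""
    problems = []
    previous_start = None
    for index, caption in enumerate(captions):
        if not isinstance(caption, dict):
            problems.append(f"caption {index}: not an object")
            continue
        start = caption.get("startTime")
        end = caption.get("endTime")
        text = caption.get("text")
        if not isinstance(start, int) or not isinstance(end, int):
            problems.append(f"caption {index}: startTime/endTime must be integers")
            continue
        # A cue must end after it starts.
        if start < 0 or end <= start:
            problems.append(f"caption {index}: invalid time range {start}-{end}")
        # Cues are expected in non-decreasing start order.
        if previous_start is not None and start < previous_start:
            problems.append(f"caption {index}: starts before the previous caption")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"caption {index}: empty or missing text")
        previous_start = start
    return problems
```

A conditional node could then branch on whether the returned list is empty.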

Limitations

URL-only support: The node only supports loading VTT files from URLs (HTTP/HTTPS). Local file paths are not supported.
No authentication headers: Custom HTTP headers for authentication are not supported. Credentials must be included in the URL as query parameters, or the endpoint must be publicly accessible.
Download timeout range: Download timeout is configurable between 10 seconds and 5 minutes (10,000-300,000ms). Very large VTT files or slow connections may exceed the maximum timeout.
No format validation: The node does not validate WebVTT format compliance beyond basic parsing. Malformed VTT files may cause parsing errors or produce incomplete caption data.
Timestamp precision: Timestamps are converted to integer milliseconds. Sub-millisecond timing information is not preserved.
No styling information: The node extracts only caption text and timing information. WebVTT styling, positioning, and formatting cues are not preserved.
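
Because authentication headers are not supported, credentials have to travel in the query string. The sketch below shows one way to append such a parameter with correct URL encoding; the helper name `add_query_params` and the parameter name `token` are hypothetical, and the actual parameter depends on the storage provider.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_query_params(url: str, params: dict[str, str]) -> str:
    """Append query parameters to a URL, keeping any that already exist."""
    parts = urlparse(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(params.items())
    # urlencode percent-encodes values so special characters stay intact.
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical usage: "token" is an assumed parameter name.
signed_url = add_query_params(
    "https://storage.example.com/subtitles/video123.vtt", {"token": "abc 123"}
)
```

The resulting URL can be stored in the workflow variable referenced by Input Field.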
